ABSTRACT

This study examines the hypothesis that the accuracy 
of teachers' evaluations of their students is higher if (1) the 
teacher and the student are of the same race, and (2) the teacher and
the student are of the same sex. Data were collected in the 1969
Survey on Compensatory Education, a survey of national scope
sponsored by the U.S. Office of Education. The survey secured data
from school superintendents, elementary school principals, and
elementary school teachers. Teachers answered questions about
themselves and their classes in responding to a "Teacher 
Questionnaire," and answered questions about individual pupils in 
their classes in responding to several "Pupil Questionnaires." Survey 
data were secured from a nationally-representative sample of public 
school systems and elementary schools which provided services, 
supported in whole or in part through Title I of the Elementary and 
Secondary Education Act of 1965, during the 1968-69 school year. Only 
students and teachers from grade 4 were considered in the present 
study. The results of this study suggest that much of the literature 
on the topic of race and social distance may not be applicable to 
classroom teachers. "White female teachers rating blacks" were most
accurate, and "black teachers rating blacks" were least accurate. The
most accurate evaluations were made by female teachers and by whites 
rating other whites. (Author/JM) 
RACE AND SEX AS CONCOMITANTS OF TEACHERS'
ACCURACY IN EVALUATIVE RATING OF
STUDENTS 1



Richard M. Jaeger
Tom D. Freijo
University of South Florida 

INTRODUCTION 

The Research Hypothesis 

It has been well documented (McDonald, 1965; Davis, 1964) that accurate
evaluation of students' strengths and weaknesses is fundamental to effective
teaching. Lacking accurate evaluation, a teacher cannot determine the
effectiveness of past instruction, nor can (s)he determine needs for
remediation.

While teachers gather some evaluative information through objective 
devices such as standardized achievement tests, a great deal of evaluation
depends upon less formal data-collection procedures (Thorndike and Hagen, 
1969). Teachers often make prescriptive decisions for students on the basis 
of unquantified observations and ratings, for example. It is the latter kinds 
of evaluations that this study concerns. 

This study examines the hypothesis that the accuracy of teachers'
evaluations of their students is higher if

(1) the teacher and the student are of the same race, 

and 

(2) the teacher and the student are of the same sex. 

The term "accuracy of evaluation" is defined quite specifically. The
definition can be motivated by describing a hypothetical situation. Suppose that a
large group of teachers of both sexes and different races provides ratings of
students' progress on several school skills and social behaviors. The rated
group of students is also large, and consists of both sexes and several races.
Suppose further that the school skills and social behaviors on which students
are rated, while not completely uncorrelated, show considerable independence
of development in the population.

An "accurate evaluation" is then characterized as one in which a teacher
considers each rated behavior independently of every other rated behavior,
and an "inaccurate evaluation" as one in which a student is rated holistically,
without regard to her/his differences across rated behaviors. This kind
of inaccuracy in rating was recognized as a serious problem by Willingham
and Jones (1938), and labeled "composite halo".

Given the situation described above, the following compound hypothesis
is tested: The accuracy of teachers' evaluations of students' progress depends
upon the race and sex of the teacher and the race and sex of the student. More
specifically, the following ordering for teachers' evaluations of students'
progress is hypothesized (from most accurate to least accurate):

(1) Teachers evaluating students of the same race, 

(2) black female teachers evaluating white students, 

(3) black male teachers evaluating white students, 

(4) white male teachers evaluating black students, 

(5) white female teachers evaluating black students.

This hypothesized ordering is consistent with a large body of sociological 
literature, much of which is reviewed below. 

One can generalize further from previous research, and hypothesize an
ordering on accuracy of rating for all combinations of sex and race of teacher
and sex and race of student. The additional hypotheses are based on
extrapolations of specific research findings; some refutations of these hypotheses
could therefore be expected. The complete ordering is as follows (from most
accurate ratings to least accurate):

(1) Teachers evaluating students of the same race and sex, 

(2) Teachers evaluating students of the same race

(3) White teachers evaluating white students 

(4) Black teachers evaluating black students 

(5) Teachers evaluating students of the same race but opposite sex 

(6) Black female teachers evaluating white students 

(7) Black male teachers evaluating white male students 

(8) Black male teachers evaluating white female students 

(9) White male teachers evaluating black students 

(10) White female teachers evaluating black students. 

Supporting Research 

While there appears to be no previous research on the relationship
between classroom teachers' evaluation accuracy and the sex and race of
teachers and students, there is a considerable body of literature for more
general populations that relates the similarity between evaluators and those
being evaluated, and evaluation accuracy. The literature appears to support
two contentions. First, evaluative raters show greater interest toward
ratees similar to themselves (Kagan, 1967); interpret their perceptions of
ratees in terms of their self-perceptions (Stagner, 1948); are more sensitive
to those characteristics of ratees which conform to their self-perceptions
(Fensterheim and Tresselt, 1953); and better recall those characteristics
of ratees which conform to their own personalities (McLaughlin, 1970).
Second, accuracy of rating is positively related to the similarity of the
rater and the ratee; the more similar the two, the more accurate the rating.
McLaughlin (1970) found that ratees were given higher ratings by similar
raters than by dissimilar raters. Christensen (1970) found that ratees were
given more accurate ratings by similar raters than by dissimilar raters.

The research reviewed above suggests that all teachers, regardless of
sex or race, might be expected to rate students of their own sex and race
more accurately than they do those of different sex and/or race. In addition,
one can hypothesize a rank ordering on teachers' accuracy of student
evaluation, which depends on the sex and race of teachers in relation to those of
the students being rated. Several studies which base their findings on the
concept of social distance support the hypothesized ranking given above.

Bogardus developed a scale in 1926 for assessing the relative perceived
social distance between groups. The scale has been used by many investigators
to study the relative perceived distance between groups of differing
nationality and race. One of the most consistent findings of these studies is the
large perceived distance between whites and blacks in the United States (Kock,
1946; Bogardus, IBsj; Meltzer, 1941; Ames, 1968; Hines, 1968). Several studies
considered the relationship between sex and social distance. Bogardus (1969),
for example, found that women perceived themselves to be considerably more
distant from those of different race and nationality than did men. Ames (1968)
and Landis (196^) confirmed Bogardus' findings. Research supports the contention
that the perceived social distance of whites from blacks is larger than the
perceived social distance of blacks from whites. Kock (1946) found that
reciprocal feelings of distance from children of the other race existed among both
white students and black students throughout their school years. She found
further that the perceived distance of white students from blacks was larger
than the perceived distance of black students from whites. Bogardus (1958)
found the same relationships to exist among black and white college students and
college graduates, between the ages of 18 and 35. These studies were consistent
in their finding that whites place blacks at the extreme end of the social
distance scale, while blacks view themselves as less socially distant from
whites. The reported findings represent averages, of course, and individual
differences exist within each racial group. Hines (1968) found that blacks
rated various national-origin and racial groups in the following order on
a social distance scale (from closest to farthest): Anglos, Mexican-Americans,
American Indians. He also found that among blacks over 21, women rated
themselves as significantly closer to whites than did men.

These results support the hypothesis that blacks, because of their
perceived social "closeness" to whites, will rate whites more accurately than
whites will rate blacks. Christensen (1970) presents further evidence to
substantiate this hypothesis. He compared blacks' ratings of whites and
whites' ratings of blacks with corresponding self-ratings on a number of
personality traits. Christensen found that black raters were just as accurate
as white raters in rating whites, but white raters were far less accurate
than black raters in rating blacks.

Significance of the Research

According to McDonald (1965), teaching is a decision-making process in
which the essentials of the educative act consist of (1) formulation of the
goals of the learning experience, (2) development of a plan for the
instructional strategy and (3) formulation of a plan for evaluating the effect of
the strategy. Similarly, Newell, Shaw and Simon (1958) present a
decision-making model which can be applied to teaching. Their model highlights the
fundamental importance of accurate evaluation in the formulation of educational
goals and the development of instructional strategies. McDonald stresses the
importance of accuracy in teachers' perceptions of students' strengths and
weaknesses:

"Such inaccuracies in our perception of pupils will probably affect 
their development adversely because of the way we will treat them. 
Inaccurate perceptions will also interfere with planning of 
appropriate learning experiences. We might be handicapped in our 
understanding of the goals likely to motivate a particular student; 
we might err also in evaluating the factors likely to stimulate
pupil change or in our estimates of the factors inhibiting change."
McDonald, p. 532. 

Ojemann and Wilkinson (1939) reported the effects on student growth
of providing teachers with additional, accurate information about students.
At the end of a one-year experimental study, the investigators found that
students of teachers given additional information had significantly higher
adjustment than did students of teachers with less information. In a
similar but more recent study, Hoyt (1955) found that providing teachers
with additional information on the characteristics of their students had
no effect on student achievement, but did result in positive changes in
student attitudes. These studies show that the amount of accurate information
a teacher has on her/his students does affect the students in important
ways. As noted by Tagiuri and Petrullo (1957), the teacher's perception of
students' characteristics is important information in defining the educational
problem to be solved.

It would seem, therefore, that determining the extent to which teachers of
one race or sex fail to make accurate judgments of the educational progress
of students of another race or sex is a significant subject of inquiry. It
would fail to be so only if the racial and sexual isolation of teachers and
students in our schools was nearly complete, and was likely to continue to
be so.

PROCEDURES 

Data Source 

Data used to investigate the hypothesis that the accuracy of teachers'
evaluations of student progress depends upon the race and sex of the teacher
and the race and sex of the student were collected in the 1969 Survey on
Compensatory Education, a survey of national scope sponsored by the U.S.
Office of Education.

The 1969 Survey on Compensatory Education (hereafter called the Survey)
secured data from school superintendents, elementary school principals, and
elementary school teachers. Teachers answered questions about themselves
and their classes in responding to a "Teacher Questionnaire", and answered
questions about individual pupils in their classes in responding to several
"Pupil Questionnaires". Survey data were secured from a nationally-representative
sample of public school systems and elementary schools which provided
services, supported in whole or in part through Title I of the Elementary
and Secondary Education Act of 1965, during the 1968-69 school year. In
addition to a Pupil Questionnaire and a Teacher Questionnaire, the Survey
used a Principal Questionnaire and a School District Questionnaire. In total,
172 multi-part questions were asked on these questionnaires: 71 on the Pupil
Questionnaire, 38 on the Teacher Questionnaire, 44 on the Principal
Questionnaire and 19 on the School District Questionnaire. All of the data secured
through the four questionnaires can be linked, in the sense that responses
for a student can be tied uniquely to responses for her/his teacher,
her/his principal and her/his school district.

The Instrument

Five questions from the 1969 Survey on Compensatory Education, two from
the Teacher Questionnaire and three from the Pupil Questionnaire, were used
in conducting the research reported here. Questions used are as follows:

Question 2 on the Teacher Questionnaire:
    What is your sex?
        Male
        Female

Question 6 on the Teacher Questionnaire:
    Are you a member of one of the national minority groups
    (Racial, or national origin groups which are a minority
    of the national population.) listed below?
        Yes

    If yes, please indicate which one:
        American Indian
        Negro
        Oriental
        Spanish-surnamed American (Persons of Cuban Descent,
        Mexican Descent, Puerto Rican Descent, Spanish Descent)

Question 2 on the Pupil Questionnaire:
    What is this pupil's sex?
        Female



Question 10 on the Pupil Questionnaire:
    Is this pupil a member of any of the following national minority
    groups? (Racial or national origin groups which are a minority
    of the national population.)
        Yes

    If yes, which one?
        American Indian
        Negro
        Oriental
        Spanish-surnamed American (Persons of Cuban Descent, Mexican
        Descent, Puerto Rican Descent, Spanish Descent)
Question 41 of the Pupil Questionnaire: 

Please indicate the change in this pupil's academic performance
and behavior since you first became his teacher during the 1968-69
school year. Rate this pupil on each item listed [below], taking
into consideration how he performed when you first became his
teacher this school year and how he performs now. (Assume that
the school year started in the Fall of 1968 and does not include
a summer session.)

(Note: On the questionnaire, the item is arranged in matrix form.
There are six possible options for each of twenty-one performances
and behaviors. The options, and the performances and behaviors
are listed below.)
Item options:
    Large change for the better
    Some change for the better
    No change (change desirable)
    No change (change not necessary)
    Some change for the worse
    Behavior not observed

Pupil performance and behaviors:
    Care in handling school property
    Responsibility in completing class assignments
    Attentiveness in class
    Creativity
    Relationships with adults
    Relationships with other pupils
    Amount of disruptive behavior
    Understanding oral instructions
    Accuracy in self evaluation
    Self concept
    Dress habits
    Anxiety
    Attendance
    Reading proficiency
    Math proficiency
    Oral expression
    Awareness of current affairs
    Educational aspirations
    Liking for his teacher
    Independent learning
    Understanding written instructions

Questions 2 and 6 on the Teacher Questionnaire and questions 2 and 10
on the Pupil Questionnaire were used as classification variables. That is,
they were used to establish linked files of students and teachers of both
sexes and two racial groups. A Likert scale was imposed on question 41 on
the Pupil Questionnaire, in order to quantify teachers' responses. "Large
change for the better" was scaled as "5", "Some change for the better" was
scaled as "4", "No change (change desirable)" was scaled as "3", "No change
(change not necessary)" was scaled as "2", "Some change for the worse" was
scaled as "1", and "Behavior not observed" ratings were treated as missing
data.
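
The scaling just described can be sketched in a few lines of Python. This is an illustration only, not the authors' program: the names LIKERT_SCALE and scale_ratings are hypothetical, and the use of None to mark missing data is an assumption.

```python
# Scale values as stated in the text; "Behavior not observed" is treated
# as missing data (represented here by None, which is an assumption).
LIKERT_SCALE = {
    "Large change for the better": 5,
    "Some change for the better": 4,
    "No change (change desirable)": 3,
    "No change (change not necessary)": 2,
    "Some change for the worse": 1,
    "Behavior not observed": None,
}


def scale_ratings(responses):
    """Convert a list of response labels to numeric scores (None = missing)."""
    return [LIKERT_SCALE[label] for label in responses]


# Hypothetical responses for three of the twenty-one rated behaviors.
example = scale_ratings([
    "Some change for the better",
    "Behavior not observed",
    "No change (change desirable)",
])
```

Any later correlation among rated behaviors would have to decide how to handle the missing entries (for instance, by pairwise or listwise deletion); the text does not say which was used.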

The Sample 

The Survey Sample. Since the data that were available for the present
research are prescribed by the design of the 1969 Survey sample, its structure
will be described first. The primary sampling units used in the Survey
were school districts with enrollments exceeding 299 students that received
funds under Title I, ESEA during the 1968-69 school year. A sample of 438
school districts was selected from a universe of 9236, using systematic
random sampling within each of four enrollment strata. Sizes of district
samples within the enrollment strata were determined using Neyman (1938)
optimal allocation. Enrollment boundaries, population sizes and sample sizes
for the four strata were as follows:

           Enrollment within     Number of Title I       Sample
Stratum    School District       Districts in Stratum    Size

1          40,000 or more                  92               91
2          9,000 to 39,999                658              124
3          3,000 to 8,999                1917              121
4          300 to 2,999                  6569              102

TOTALS                                   9236              438
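
As a quick check on the stratum figures, the following sketch recomputes the totals and the implied sampling fraction in each stratum. The variable names are hypothetical; the counts are those reported above.

```python
# stratum -> (number of Title I districts in stratum, sample size)
strata = {
    1: (92, 91),
    2: (658, 124),
    3: (1917, 121),
    4: (6569, 102),
}

total_districts = sum(population for population, _ in strata.values())
total_sampled = sum(sample for _, sample in strata.values())

# Sampling fraction within each stratum: sampled districts / districts available.
fractions = {h: sample / population for h, (population, sample) in strata.items()}
```

The fractions fall sharply from the largest-enrollment stratum to the smallest, which is the pattern one would expect when large districts are sampled almost exhaustively.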



Sampling of schools within selected districts was, with slight
modification, accomplished using a systematic random procedure with a sampling
fraction of 1:1.4. The sampling frame used within each selected district was
a list of all schools with at least one of grades 2, 4, or 6 that provided
services supported in whole or in part under Title I, ESEA.

Within sampled schools, all principals and all teachers in grades 2, 4
and 6 were sent survey questionnaires, provided at least 15 pupils were
enrolled in a sampled grade. Sampled teachers were asked to complete
questionnaires for themselves and for a sample of 3 to 6 students in their classes.
Each sampled teacher was provided with a precise systematic sampling procedure
for the selection of students within her/his class.

This sampling procedure resulted in the selection of 2,920 schools, 
22,067 teachers, and 104,036 students in grades 2, 4 and 6. The Survey response 
rates exceeded ninety percent. 


The Research Sample. While the entire sample of teachers and students
available from the 1969 Survey could have been used in the study, to do so would have
been unnecessarily wasteful. There was no desire in the present research to
estimate totals for the national population, as was true in the 1969 Survey. Only
students and teachers from grade 4 were considered in the present study.

The procedure used for sampling the available data was based upon two 
factors. First, the essential randomness of the data file was to be preserved. 
Since the data were arranged by schools within districts, districts within 
strata and strata within states, a systematic sampling procedure was used to 
assure proportional representation across these classification variables, 
while preserving randomness. Second, the size of the Survey data file afforded 
the opportunity to investigate the stability of the analytic findings using 
jackknife procedures as suggested by Miller (1968).
Prior to sampling, the classification variables described above
(questions 2 and 6 on the Teacher Questionnaire and 2 and 10 on the Pupil
Questionnaire) were used to create separate but linked files of teacher data and
student data, classified by race and sex; sixteen files were thus created.
Since it was found that only twenty-three white male students were rated by
black male teachers, and only fourteen white female students were rated by
black male teachers, these files were eliminated from further analyses. The
remaining fourteen files were then joined to compose groups of data,
corresponding to eight of the ten hypotheses listed above (two hypotheses could not
be investigated, since data for black male teachers rating white students
were not available in sufficient quantity). The groups of data were then
sampled systematically, with sampling fractions chosen so as to provide samples
of approximately 1000 ratings of students by teachers. In two groups no
additional sampling was performed since the numbers of ratings available were
either less than or close to 1000. The eight groups, the numbers of cases
in each group, and the sampling fractions used are shown in Table 1.



Insert Table 1 about here 



Because single classification variables were used to designate some
groups and multiple classification variables were used to designate others,
the groups listed in Table 1 are not mutually exclusive. For example, the
group "Teachers rating students of the same race and sex" is a subset of the
group "Teachers rating students of the same race". For purposes of analysis,
it was necessary that the samples used be non-overlapping. Therefore,
starting-points for the systematic sampling procedures were chosen so as to provide
mutually exclusive samples.
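
The idea of choosing different starting points to obtain non-overlapping systematic samples can be illustrated with a minimal sketch. It assumes, for simplicity, that both samples are drawn from the same ordered file with the same sampling interval; the actual intervals and starting points used in the study are not given in the text, and the function name systematic_sample is hypothetical.

```python
def systematic_sample(records, interval, start):
    """Take every `interval`-th record, beginning at index `start`."""
    if not 0 <= start < interval:
        raise ValueError("start must lie in [0, interval)")
    return records[start::interval]


# A stand-in for an ordered data file of twenty ratings.
records = list(range(20))

# Same interval, different starting points: the two samples cannot share a record.
sample_a = systematic_sample(records, interval=5, start=0)
sample_b = systematic_sample(records, interval=5, start=2)
```

When the sampling intervals differ between groups, distinct starting points alone do not guarantee disjoint samples, so some additional bookkeeping would be needed in that case.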
Data Analysis

Heuristic Discussion. The purpose of the procedures used for analysis
of data was to examine the relative structural complexity of teachers' ratings
of students, for white and black teachers of both sexes, and white and black
students of both sexes. Put another way, it was assumed that variables such
as "Care in handling school property", "Accuracy in self evaluation", and
"Reading proficiency", although positively correlated across students, would
exhibit some unique variation. It was hypothesized, for example, that a
white teacher rating a white student would be better able to perceive the
unique character of these variables than would a white teacher rating a black
student. If hypotheses such as these were correct, the result would be a more
complex relational structure among the rated variables, when the race or sex
of teacher and student were the same, than when the race or sex of teacher and
student were different.

These hypotheses were examined using two different analytic procedures:
principal components analysis (Kaiser, 1958) and non-metric multidimensional
scaling (Shepard, 1962; Kruskal, 1964a). Each procedure was applied
independently to teachers' ratings of students, when teachers and students were
grouped by sex and race, as listed in Table 1. The details of these
procedures are discussed below.

Principal Components Analysis. Principal components analyses of teachers'
ratings of students were completed independently, using the sampled data for
each of the groups listed in Table 1. Intercorrelations among the twenty-one
rated student behaviors were computed first, and the resulting correlation
matrix was then analyzed through the principal components algorithm provided
by the BMD03M computer program (Dixon, 1965).
The criterion of rating accuracy used in this study was, as noted above,
highly specialized. In particular, teachers' ratings were judged to be more
accurate, the lower the composite halo effect exhibited. For the principal
components analyses, degree of composite halo effect was operationally defined
by the proportion of standardized variance among the twenty-one dimensions
rated, that was accounted for by the first principal component. Thus the
larger the variance attributable to a single factor, the larger the composite
halo effect.
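
The operational definition above amounts to the largest eigenvalue of the correlation matrix divided by the number of rated variables. A minimal NumPy sketch follows; it is not the BMD03M program, the function name first_component_share is hypothetical, and it assumes a complete (no missing values) matrix of ratings with pupils in rows and behaviors in columns.

```python
import numpy as np


def first_component_share(ratings):
    """Proportion of standardized variance accounted for by the first
    principal component of the correlation matrix of `ratings`."""
    corr = np.corrcoef(ratings, rowvar=False)   # behaviors are columns
    eigenvalues = np.linalg.eigvalsh(corr)      # returned in ascending order
    return float(eigenvalues[-1] / eigenvalues.sum())


# Two uncorrelated behaviors: no halo, so the share is 1/2.
no_halo = np.array([[1, 1], [-1, 1], [1, -1], [-1, -1]], dtype=float)
share_no_halo = first_component_share(no_halo)

# Three perfectly correlated behaviors: complete halo, so the share is 1.
x = np.arange(10, dtype=float)
full_halo = np.column_stack([x, 2 * x + 1, 3 * x])
share_full_halo = first_component_share(full_halo)
```

With twenty-one rated behaviors the share can range from 1/21 (no common variation) to 1 (ratings entirely determined by a single factor).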

For purposes of comparing the samples of ratings of each of the groups
listed in Table 1, then, it would have been sufficient to compute the
proportion of variance accounted for by the first principal component of their
respective correlation matrices. A comparison among populations represented
by these samples (rather than the samples themselves) was of primary
interest, however. The analyses were therefore more complex. It was desired to
formally test hypotheses on the pairwise equality of composite halo effect
for the populations represented by each pair of groups listed in Table 1.
Since the sampling distribution of the proportion of variance accounted for
by a single principal component is not known, approximation techniques were
required. The procedure employed was the jackknife technique (Miller, 1968).
By dividing sample data into overlapping subsets and computing estimates for
each, the jackknife procedure provides an estimate of a population parameter
of interest, and in addition, provides an estimate of the variance of the
estimator. These two estimates (the parameter estimate and its variance
estimate) can be combined through standard statistical procedures to
formally test hypotheses.
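
A delete-one-group jackknife of the kind described can be sketched as follows. This is a generic textbook form, not a reconstruction of the authors' computations: the number of groups they used is not stated, and the function name jackknife and its arguments are hypothetical.

```python
import numpy as np


def jackknife(data, statistic, n_groups):
    """Delete-one-group jackknife.

    Returns the jackknife estimate of the parameter and an estimate of the
    variance of that estimate, both built from pseudo-values.
    """
    data = np.asarray(data)
    groups = np.array_split(np.arange(len(data)), n_groups)
    theta_all = statistic(data)
    pseudo_values = []
    for idx in groups:
        keep = np.ones(len(data), dtype=bool)
        keep[idx] = False                      # drop one group of cases
        theta_minus = statistic(data[keep])    # estimate from the remaining cases
        pseudo_values.append(n_groups * theta_all - (n_groups - 1) * theta_minus)
    pseudo_values = np.array(pseudo_values)
    estimate = pseudo_values.mean()
    variance = pseudo_values.var(ddof=1) / n_groups
    return float(estimate), float(variance)


# Sanity check with the sample mean, for which the pseudo-values are the data.
estimate, variance = jackknife([1, 2, 3, 4, 5, 6], np.mean, n_groups=6)
```

In the present application the statistic would be the proportion of variance accounted for by the first principal component, computed on each reduced sample of ratings.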

Once jackknife estimates of the proportion of variance accounted for by
a single principal component and estimates of the variances of the jackknife
estimates were computed for each group listed in Table 1, pairwise
comparisons were made using Bonferroni t-statistics (Miller, 1966). The
twenty-eight pairwise comparisons were made with an overall experimental error rate
of five percent.
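
The arithmetic behind the twenty-eight comparisons and the Bonferroni adjustment can be sketched as below. The form of the test statistic (difference of two independent estimates divided by the square root of the sum of their variance estimates) is an assumption, since the text does not give the formula; the group labels and the function name pairwise_t are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_t(estimates, variances):
    """t-like statistic for every pair of groups, assuming independent estimates."""
    return {
        (a, b): (estimates[a] - estimates[b]) / sqrt(variances[a] + variances[b])
        for a, b in combinations(sorted(estimates), 2)
    }


# Eight groups give 8 * 7 / 2 = 28 pairwise comparisons.
groups = [f"G{i}" for i in range(1, 9)]
n_comparisons = len(list(combinations(groups, 2)))

# Bonferroni: split the overall 5 percent error rate evenly across comparisons.
per_comparison_alpha = 0.05 / n_comparisons

# Hypothetical jackknife estimates and variances for two groups.
t_values = pairwise_t({"A": 0.5, "B": 0.3}, {"A": 0.01, "B": 0.03})
```

Each statistic would then be referred to a t distribution at the per-comparison level rather than at 0.05.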

Non-metric multidimensional scaling. The relative composite halo effect
among ratings by the eight groups listed in Table 1 was also investigated using
multidimensional scaling. The advantage of multidimensional scaling is its
weak assumption on the measurement properties of teachers' ratings. While
the principal components procedure assumes an interval level of measurement
for teachers' ratings of students, the multidimensional scaling procedure
requires only ordinal measurement.

The operational definition of degree of composite halo effect used with
multidimensional scaling was the size of the "stress" value resulting from
an attempt to fit the relationships among the twenty-one rated variables
into a unidimensional space. Stress is a measure of badness of fit, as
defined by the Kruskal multidimensional scaling algorithm (1964a). That is,
the higher the stress value, the worse the fit of the data to a single
dimension (and concomitantly, the smaller the composite halo effect).
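
To make the stress measure concrete, the sketch below evaluates Kruskal's stress (formula 1) for a given one-dimensional configuration. It is not the MDSCAL program and does not search for the best configuration; it only scores one. The function names pava and stress_1d are hypothetical, ties among dissimilarities are handled by simple stable ordering (an assumption), and the conversion of correlations to dissimilarities (for example 1 - r) is left to the caller.

```python
import numpy as np


def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block holds [sum of values, count]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while a block mean exceeds the mean of the block after it.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])


def stress_1d(dissimilarities, coords):
    """Kruskal stress (formula 1) of a one-dimensional configuration."""
    coords = np.asarray(coords, dtype=float)
    upper = np.triu_indices(len(coords), k=1)
    delta = np.asarray(dissimilarities, dtype=float)[upper]
    dist = np.abs(np.subtract.outer(coords, coords))[upper]
    order = np.argsort(delta, kind="stable")
    disparities = np.empty_like(dist)
    disparities[order] = pava(dist[order])  # monotone regression on rank order
    return float(np.sqrt(np.sum((dist - disparities) ** 2) / np.sum(dist ** 2)))


# Distances that already lie on a line fit perfectly: stress is zero.
perfect = stress_1d([[0, 1, 3], [1, 0, 2], [3, 2, 0]], [0.0, 1.0, 3.0])

# Dissimilarities whose rank order contradicts the configuration: stress is positive.
conflict = stress_1d([[0, 3, 1], [3, 0, 2], [1, 2, 0]], [0.0, 1.0, 2.0])
```

A low stress in one dimension means the rated behaviors line up along a single continuum, which is how the study operationalizes a strong composite halo.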

The procedures used with multidimensional scaling were nearly identical
to those described above for the principal components analyses. As in the
principal components analyses, the intercorrelations among the twenty-one rated
student behaviors were first computed separately for each of the eight groups
listed in Table 1. These correlations were treated as similarities among the
twenty-one variables, and were used as inputs to Kruskal's MDSCAL IV computer
program (1964b).

Since the sampling distribution of the stress statistic is unknown, the
jackknife procedure was once again employed to test pairwise hypotheses of
identical composite halo effect among the eight groups listed in Table 1.
Again, Bonferroni t-statistics were used to maintain an overall experimental
error rate of five percent.

RESULTS 

Two independent statistical techniques were used to test hypotheses in
this study. Since the two techniques provided differing results, the findings 
for each procedure will be presented separately. 

Factor Analysis 

As defined in this study, an inaccurate evaluation is one which exhibits
a large amount of composite halo effect. For factor analyses, the degree of
composite halo effect has been operationally defined as the proportion of
standardized variance among rated behaviors accounted for by the first
principal component. An accurate evaluation is therefore characterized by a factor
analysis in which a relatively small proportion of variance is accounted for
by the first principal component; an inaccurate evaluation is characterized
by a factor analysis in which a relatively large proportion of variance is
accounted for by the first principal component.

Using these definitions, the groups of teachers and students corresponding
to the hypotheses of this study are listed in decreasing order of accuracy
of evaluative ratings in Table 2. For comparative purposes, the hypothesized
ordering is also listed.

Insert Table 2 about here.

The hypothesized ordering of groups and the ordering
found through factor analysis differ considerably. To be consistent with
the research cited above, the "White females rating blacks" group should
have produced the least accurate evaluations. Conversely, the "Blacks rating
blacks" group should have produced the most accurate evaluations. In fact,
the evaluations provided by these groups were directly opposite to the rankings
hypothesized; "White females rating blacks" were most accurate, and "Blacks
rating blacks" were least accurate. In terms of individual characteristics,
the most accurate evaluations were made by female teachers and by whites rating
other whites. Since a large percentage of the white teachers were female, the
"Whites rating whites" group reflects a predominance of females that may
outweigh the racial classification.

The eight groups listed in Table 1 give rise to 28 pairwise comparisons
of evaluation accuracy. When these comparisons were made, only two pairs of
groups were found to differ significantly in the proportions of standardized
variance accounted for by the first principal component. The two significant
differences (at the 0.05 level) were found between the "White females rating
blacks" group vs. the "Blacks rating blacks" group and between the "Whites
rating whites" group vs. the "Blacks rating blacks" group. For the "Blacks
rating blacks" group, the first principal component accounted for a larger
percentage of variance in both comparisons, indicating theirs was the less
accurate evaluation. The statistics associated with all 28 comparisons are
listed in Table 3.



Insert Table 3 about here 



Multidimensional Scaling 

In a multidimensional scaling analysis, a high degree of composite halo
effect (inaccurate evaluation) is operationally defined by a relatively low
value of stress when the relationships among behavioral ratings are mapped into
a single dimension. Therefore, the lower the stress value associated with a
one-dimensional solution, the less accurate the evaluation; conversely, the
higher the stress value, the more accurate the evaluation.

Using this definition, the groups of teachers and students corresponding
to the hypotheses of this study are listed in decreasing order of accuracy
of evaluative ratings in Table 4. Once again, the hypothesized ordering is
listed for comparative purposes. The hypothesized ordering of groups and the
ordering found through multidimensional scaling are somewhat different.
Consistent with the factor analysis results cited above, the "White females rating
blacks" group produced the most accurate evaluations. The "Blacks rating
blacks" group, which the theory predicted to be most accurate, was among the
least accurate.
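
The operational definition above implies a simple ordering rule: sort the groups by one-dimensional stress, highest first. The sketch below applies that rule to the jackknife stress estimates reported in Table 5. The variable names are illustrative and are not the authors'.

```python
# Jackknife estimates of one-dimensional stress, keyed by group number
# (values from Table 5). Under the operational definition, higher stress
# means less composite halo and therefore a more accurate evaluation.
STRESS = {
    1: 0.4097,
    2: 0.2964,
    3: 0.4094,
    4: 0.4167,
    5: -0.0629,
    6: 0.2804,
    7: 0.2956,
    8: 0.3255,
}

# Group numbers in decreasing order of evaluation accuracy.
accuracy_order = sorted(STRESS, key=STRESS.get, reverse=True)

for rank, group in enumerate(accuracy_order, start=1):
    print(rank, group, STRESS[group])
```

Sorting on the tabled estimates places Group 4 ("White females rating blacks") first and Group 5 ("Whites rating whites") last, which agrees with the ordering described in the text.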



Insert Table 4 about here 



When multidimensional scaling results were used to make pairwise comparisons
of evaluation accuracy for the eight groups listed in Table 1, significant
differences were found for only two pairs of groups. The "Whites rating whites"
group was found to be significantly less accurate in its evaluations (at the
0.05 level) than the "White males rating blacks" group. The "Whites rating
whites" group was also found to provide significantly less accurate evaluations
than did the "White females rating blacks" group. Statistical results for all
28 pairwise comparisons are shown in Table 5.



Insert Table 5 about here



CONCLUSIONS

Factor Analysis
1. Overview 

The results of this study do not agree with the results predicted by the
cited sociological literature. However, the role of research is not just to
substantiate hypothesized relationships, but to identify unexpected findings
and reconcile these with established knowledge. To that end, a rationale that
explains the results of this study is developed in this section. The rationale
is based upon three main contentions: (1) that teachers are different from
the population on which most of the perceived social distance research is based,
(2) that male teachers are different from female teachers in ways that would
affect their evaluation accuracy, and (3) that black teachers are different
from white teachers in ways that would affect their evaluation accuracy.

These three main contentions are elaborated and supported with references
to previous studies. An explanation of the ordering of dyads obtained in this
study, consistent with research on the sociology of teachers and teaching, is
then presented.

2. Contentions 

1) In reviewing the literature dealing with social distance, the relative
difference between the perceived distances of whites from blacks and
blacks from whites was well established; however, the social distance studies
reviewed did not restrict their populations to teachers. There is evidence
to substantiate the contention that teachers may be different from the general
population (and even other college graduates) in ways which would affect their
perceived social distances, because of two factors: (1) selection and (2)
training. Several studies have indicated that those who select teaching as
a career possess a system of values which places a high priority on the social
worth of others (Mason, et al., 1959; Brembeck, 1962; Rosenberg, et al., 1957;
Gottlieb, 1961; Spaeth, 1959). In their training of teachers, many colleges
have as a high priority to sensitize prospective teachers to the worth of
individuals and the acceptance of individual differences (Spindler, 1968).
Therefore, because of self-selection and specialized training, it may be assumed
that teachers exhibit different patterns of perceived social distance than the
populace as a whole. If this were the case, the relationships between sex,
race and perceived social distance hypothesized for the general population
would not apply to the population of teachers.

2) Another explanation for the discrepancy between the hypothesized and
actual orderings of groups on accuracy of evaluation might be the presence
of one or more confounding variables which are differentially represented in
the population of teachers, when categorized by race and sex. One such variable
is socio-economic status (SES). A review of literature showed that more than
one sociological study supports the contention that SES is more important than
race in determining perceived social distance; e.g., white middle-class persons
have less perceived social distance from middle-class blacks than from
lower-class whites (Gordon, 1964; Lasdis, 1966).

3) Other potentially confounding variables are the levels of training
and/or motivation of teachers represented in the different groups. For example,
it is well documented that many males enter classroom teaching with
expectations of attaining positions in educational administration while a very
high percentage of females have no aspirations to move from classroom teaching
to some "higher" position (Mason, et al., 1959; Colombotos, 1962; Graebell and
Olson, 1973).

4) More than one sociologist has noted a phenomenon exhibited by blacks
who reach middle-class status. It has been asserted that because of the
oppression and negative prejudice to which lower-class blacks are subjected,
blacks who reach middle-class status often disassociate themselves from
lower-class blacks. Moreover, an expression of their disassociation is a level of
criticism of lower-class blacks that exceeds the criticism exhibited by
middle-class whites. Hentoff (1965) discusses this phenomenon and cites several
supporting sources.

5) It has been proposed by several educators and sociologists that for
many decades blacks have been "trapped" in a self-perpetuating and inferior
educational system. It is claimed that black teachers have been provided an
inferior education, teach predominantly black students, and that some of these
students are trained to become teachers by black colleges, thus perpetuating
the system. This proposition was substantiated by the survey and case studies
mandated by the Civil Rights Act of 1964 (Coleman, et al., 1966; USOE, 1966).
If black teachers do receive inferior training, they would be expected to be
less accurate in their evaluation of students.

3. Explanation of Findings

If the contentions described and supported above are assumed to be true,
they may provide a basis for explaining the ordering of groups on accuracy
of evaluation as reported for the factor analysis procedure.

Why would white female teachers evaluating blacks provide the most accurate
evaluations? Teaching has long been a respectable terminal career for
middle-class females. Teachers in this group rarely use their teaching position
as a stepping stone to a "higher" position in the educational hierarchy. It
is reasonable to assume, then, that middle-class teachers are, as a group,
interested in classroom teaching and are, for the most part, highly motivated.
Their training, too, is probably specific to the classroom teacher position,
in contrast to administratively-oriented males who may seek specific training
in administration. If the ability to evaluate effectively is one of the skills
that defines good classroom teaching, it is reasonable to conclude that persons
specifically trained for that role position, interested in the position,
and highly motivated toward effective functioning in the position, will evaluate
more effectively than those who are not. Why then do black teachers
exhibit such a high degree of composite halo when rating black students (Group
6), even though ninety percent of the black teachers are female? There are
two possible reasons: (1) The training received by black female teachers is
probably inferior to the training received by white female teachers in the
majority of cases, and the evaluation accuracy of black female teachers may
be reduced accordingly; (2) At the time the data reported here were collected,
most black female teachers were teaching classes that were predominantly or
exclusively black. A large proportion of low socio-economic status students
would be expected in these classes. It has been contended that blacks, upon
reaching middle-class status, often perceive themselves to be socially
distant from lower-class blacks. It is reasonable to assume that this phenomenon
would be exhibited by middle-class black female teachers. A combination
of inferior training and large perceived social distance, then, could explain
the relative inaccuracy of black teachers in rating black students.

Black female teachers were relatively accurate in their evaluations of
white students (Group 2). This result is not consistent with the general
sociological theory discussed above, and requires further explanation. Again,
two possibilities are suggested by the sociology of the teaching profession:
(1) The number of black female teachers who teach white students is far smaller
than the number who teach black students. It is not unreasonable to assume
that black females who teach white students do so in predominantly white schools
and are placed in those schools because they are good teachers (Sexton, 1964).
(2) Black female teachers may feel no need to alienate themselves from
lower-class white students, although, as has been contended above, they do feel the
need to alienate themselves from lower-class black students. It is also reasonable
to assume that the socio-economic status of student bodies in predominantly
white schools is higher than that in predominantly black schools; the
perceived social distance of black female teachers from middle-class white
students would be smaller than their perceived social distance from lower-class
blacks.

The "Whites rating whites" group (Group 5) is ranked third in order of
accuracy of evaluation. This group contains male as well as female teachers,
whereas the groups ranked first and second in evaluation accuracy contained
only female teachers. It has been claimed that white male teachers are relatively
disinterested in teaching, are relatively poorly motivated, and often
enter the teaching profession with the expectation of moving on to administrative
positions. Why, then, does the "Whites rating whites" group rank third
in evaluation accuracy? The composition of the group probably explains its
high ranking: since more than ninety percent of the white teachers are female,
the small proportion of male teachers in the group would only slightly diminish
the evaluation accuracy of a group primarily composed of white female teachers.

Group 1 (Teachers rating students of the same race) ranked fourth in
accuracy of evaluation. This intermediate position might properly reflect a
combination of elements that would enhance evaluation accuracy (the superior
training, interest, and motivation of white female teachers) and elements
that would diminish evaluation accuracy (inferior training of black teachers,
relative lack of motivation and disinterest of male teachers).

The "White males rating blacks" group (Group 3) was also intermediate in
evaluation accuracy; its rank was five. According to the contentions cited
above, the group is composed of relatively disinterested teachers who are
evaluating students from whom they feel less socially distant than do
middle-class black female teachers. These offsetting factors would suggest an
intermediate position on accuracy of evaluation.

Sociological arguments explicit to the classroom teacher role do not
suggest the relative ordering observed for groups 3, 7 and 8 ("White males
rating blacks", "Teachers rating students of the same race and sex", and
"Teachers rating students of the same race but opposite sex", respectively).
However, on the operational variable (proportion of variance accounted for
by a single factor) used to define evaluation accuracy, these groups were
separated by only eight-hundredths of one percent, a statistically insignificant
separation.

Multidimensional Scaling

The arguments used to explain the results on accuracy of evaluation found
through factor analysis can, with two exceptions, be used to explain the results
found through multidimensional scaling. The "Black females rating whites"
group was found to be second most accurate in evaluation using the factor
analysis procedure, but was found to be fifth most accurate using the
multidimensional scaling procedure. The results of the multidimensional scaling
analysis for this group are generally consistent with the original hypotheses
of this study, but do not conform to the explanation given above for the
factor analysis results. The "Whites rating whites" group was found to be
third most accurate in evaluation using the factor analysis procedure, but was
found to be the least accurate group using the multidimensional scaling
procedure. This result is not consistent with the original hypotheses of this
study, nor can it be explained by the teacher-specific sociology used to
explain the factor analysis results. The negative value of "stress" obtained
for this group defies statistical explanation, although it is an allowable
consequence of the jackknife procedure. The best estimate of "stress" for the
"Whites rating whites" group is therefore in doubt.

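One way to see why a jackknifed stress value can fall below zero: the usual bias-corrected jackknife estimate is k times the full-sample statistic minus (k - 1) times the mean of the leave-one-subset-out statistics, and nothing constrains that difference to the range of the statistic itself. The sketch below assumes this standard form. The number of subsets and all numeric values in it are invented for illustration and are not taken from the study.

```python
def jackknife_estimate(full_sample_value, leave_one_out_values):
    """Bias-corrected jackknife estimate: k*theta - (k - 1)*mean(theta_minus_i)."""
    k = len(leave_one_out_values)
    mean_loo = sum(leave_one_out_values) / k
    return k * full_sample_value - (k - 1) * mean_loo


# Hypothetical numbers only: a full-sample stress of 0.30 and five
# leave-one-subset-out stress values that are all somewhat larger.
full_stress = 0.30
loo_stress = [0.36, 0.39, 0.37, 0.38, 0.40]

estimate = jackknife_estimate(full_stress, loo_stress)

# The estimate is negative even though every input is a positive stress value.
print(round(estimate, 2))
```

Whenever the leave-one-out values sit far enough above the full-sample value, the correction term outweighs it and the estimate becomes negative, which is why such a value is allowable under the procedure even though stress itself cannot be negative.
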
Comparison of Factor Analysis Results and Multidimensional Scaling Results

The most consistent result of the two methods of analysis was the number
one ranking on accuracy of evaluation of the "White females rating black
students" group. Considering all of the groups, however, the two methods
of analysis produced marked inconsistencies. The Spearman rank-order correlation
between the two sets of rankings was .38. The inordinately low ranking
of the "Whites rating white students" group in the multidimensional scaling
analysis (a possible statistical artifact) materially reduced the rank-order
correlation between the two sets of rankings. Had this group been ranked identically
on accuracy of evaluation for both methods of analysis, the rank-order
correlation between groups would have been .69.
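
The two rank-order correlations reported above can be checked from the orderings in Tables 2 and 4. The sketch below uses the untied-ranks form of Spearman's coefficient, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)). The counterfactual ordering is an assumption about how the authors re-ranked the groups: Group 5 is moved to third place in the scaling order and the remaining groups keep their relative order.

```python
def ranks_from_order(order):
    """Map each group number to its 1-based position in an accuracy ordering."""
    return {group: position for position, group in enumerate(order, start=1)}


def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-order correlation for untied ranks."""
    n = len(ranks_a)
    d_squared = sum((ranks_a[g] - ranks_b[g]) ** 2 for g in ranks_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Group numbers in decreasing order of accuracy (Tables 2 and 4).
factor_analysis_order = [4, 2, 5, 1, 3, 7, 8, 6]
mds_order = [4, 1, 3, 8, 2, 7, 6, 5]

rho = spearman_rho(ranks_from_order(factor_analysis_order),
                   ranks_from_order(mds_order))

# Assumed counterfactual: Group 5 takes rank 3 in the scaling order,
# as it does in the factor analysis order; other groups keep their order.
others = [g for g in mds_order if g != 5]
mds_counterfactual = others[:2] + [5] + others[2:]
rho_cf = spearman_rho(ranks_from_order(factor_analysis_order),
                      ranks_from_order(mds_counterfactual))

print(round(rho, 2), round(rho_cf, 2))  # 0.38 0.69
```

Under these assumptions the sum of squared rank differences falls from 52 to 26, which reproduces the reported values of .38 and .69.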

Because the results obtained for the "Whites rating white students"
group with the multidimensional scaling analysis defy explanation, and because
the results obtained with factor analysis are supported by sociological theory,
it is suggested that greater confidence be placed in the factor analysis
results.

CONCLUDING REMARKS

The results of this study suggest that much of the literature on the topic
of race and social distance may not be applicable to classroom teachers.
This preliminary finding could be confirmed through a direct investigation
of the social distance perceptions of classroom teachers. A study of the
elements of teacher training curricula that affect social distance perceptions
would also be worthwhile.

It has also been suggested that social class may be a more important
factor than race in determining social distance perceptions. While this suggestion
derives more from the post hoc interpretation of the findings of this
study than from the data explored, it is consistent with a body of sociological
literature. A study of the relative contributions of race and social class
to the evaluation accuracy of classroom teachers is now in progress.



TABLE 1

GROUPING OF TEACHERS AND PUPILS, INITIAL SAMPLE SIZES, AND SAMPLING FRACTIONS

Group                                                         Number of Cases          Sampling Fraction
                                                              Available from Survey

Teachers rating students of the same race                     23,262                   1/20
Black female teachers rating white students                   662                      all cases used
White male teachers rating black students                     1,375                    all cases used
White female teachers rating black students                   4,275                    1/4
White teachers rating white students                          14,069                   1/10
Black teachers rating black students                          9,193                    1/10
Teachers rating students of the same race and sex             11,346                   1/10
Teachers rating students of the same race but opposite sex    11,916                   1/10



TABLE 2

HYPOTHESIZED AND ACTUAL LISTING OF GROUPS IN DECREASING ORDER
OF ACCURACY OF EVALUATION. ACTUAL ORDERING BASED ON FACTOR
ANALYSIS

Group                                                                Group
No.    Hypothesized Order                                            No.    Actual Order

7      Teachers rating students of the same race and sex             4      White female teachers rating black students
1      Teachers rating students of the same race                     2      Black female teachers rating white students
5      White teachers rating white students                          5      White teachers rating white students
6      Black teachers rating black students                          1      Teachers rating students of the same race
8      Teachers rating students of the same race but opposite sex    3      White male teachers rating black students
2      Black female teachers rating white students                   7      Teachers rating students of the same race and sex
3      White male teachers rating black students                     8      Teachers rating students of the same race but opposite sex
4      White female teachers rating black students                   6      Black teachers rating black students



TABLE 3

PROPORTIONS OF VARIANCE ACCOUNTED FOR BY FIRST PRINCIPAL COMPONENT IN VARIOUS
TEACHER-STUDENT GROUPINGS, AND PAIRWISE TESTS OF BETWEEN-GROUP DIFFERENCES

                                                                     Proportion of           Proportion of Variance    Standard
                                                                     Variance Accounted      Accounted for by First    Error of
Group                                                                for by First            Principal Component       Jackknife
No.    Group                                                         Principal Component     (Jackknife Estimate)      Estimate

1      Teachers rating students of the same race                     .395                    .3923                     .0141
2      Black female teachers rating white students                   .389                    .3863                     .0173
3      White male teachers rating black students                     .417                    .4178                     .0129
4      White female teachers rating black students                   .375                    .3720                     .0112
5      White teachers rating white students                          .383                    .3866                     .0125
6      Black teachers rating black students                          .434                    .4367                     .0065
7      Teachers rating students of the same race & sex               .420                    .4182                     .0122
8      Teachers rating students of the same race but opposite sex    .415                    .4186                     .0141

Groups  t-statistic    Groups  t-statistic    Groups  t-statistic    Groups  t-statistic

1-2           .269     2-3         -1.461     3-5          1.826     4-8         -2.581
1-3         -1.357     2-4           .694     3-6         -1.307     5-6         -3.877*
1-4          1.108     2-5           .146     3-7          -.023     5-7         -1.913
1-5           .316     2-6         -2.712     3-8           .042     5-8         -1.753
1-6         -2.828     2-7         -1.515     4-5          -.922     6-7          1.342
1-7         -1.388     2-8         -1.434     4-6         -4.992*    6-8          1.164
1-8         -1.315     3-4          2.676     4-7         -2.791     7-8           .021

* Significant at the .05 level; critical value = 3.82



TABLE 4

HYPOTHESIZED AND ACTUAL LISTING OF GROUPS IN DESCENDING ORDER
OF ACCURACY OF EVALUATION, ORDERING BASED ON MULTIDIMENSIONAL SCALING

Group                                                                Group
No.    Hypothesized Order                                            No.    Actual Order

7      Teachers rating students of the same race and sex             4      White female teachers rating black students
1      Teachers rating students of the same race                     1      Teachers rating students of the same race
5      White teachers rating white students                          3      White male teachers rating black students
6      Black teachers rating black students                          8      Teachers rating students of the same race but opposite sex
8      Teachers rating students of the same race but opposite sex    2      Black female teachers rating white students
2      Black female teachers rating white students                   7      Teachers rating students of the same race and sex
3      White male teachers rating black students                     6      Black teachers rating black students
4      White female teachers rating black students                   5      White teachers rating white students



TABLE 5

STRESS VALUES FOR ONE-DIMENSIONAL STUDENT RATINGS BY TEACHERS IN VARIOUS
TEACHER-STUDENT GROUPINGS, AND PAIRWISE TESTS OF BETWEEN-GROUP DIFFERENCES

Group                                                                Stress      Jackknife Estimate      Standard Error of
No.    Group                                                                     of Stress               Jackknife Estimate

1      Teachers rating students of the same race                     .443        .4097                   .0738
2      Black female teachers rating white students                   .399        .2964                   .0455
3      White male teachers rating black students                     .431        .4094                   .0336
4      White female teachers rating black students                   .441        .4167                   .0387
5      White teachers rating white students                          .280       -.0629(a)                .1053
6      Black teachers rating black students                          .410        .2804                   .0908
7      Teachers rating students of the same race & sex               .382        .2956                   .0406
8      Teachers rating students of the same race but opposite sex    .357        .3255                   .0268

Groups  t-statistic    Groups  t-statistic    Groups  t-statistic    Groups  t-statistic

1-2          1.306     2-3         -1.998     3-5          4.273*    4-8          1.938
1-3           .004     2-4         -2.014     3-6           .301     5-6         -2.584
1-4           .084     2-5          3.131     3-7          2.159     5-7         -3.175
1-5          3.674     2-6          -.158     3-8          1.953     5-8         -3.574
1-6          1.105     2-7           .013     4-5          4.275*    6-7           .153
1-7          1.354     2-8           .551     4-6          1.381     6-8           .476
1-8          1.072     3-4          -.143     4-7          2.174     7-8           .614

(a) This value was treated as 0.0 in the calculation of t-statistics
* Significant at the .05 level; critical value = 3.82



